7 research outputs found

    Investigating spoken emotion : the interplay of language and facial expression

    This thesis investigates how spoken expressions of emotion are influenced by the characteristics of the spoken language and by the facial expression of emotion. The first three chapters examined how the production and perception of emotions differ between Cantonese (a tone language) and English (a non-tone language). The rationale for this contrast was that the acoustic property of fundamental frequency (F0) may be used differently in the production and perception of spoken emotion expressions in tone languages, as F0 may be reserved as a linguistic resource for the production of lexical tones. To test this idea, I first developed the Cantonese Audio-visual Emotional Speech (CAVES) database, which provided the stimuli for all the studies presented in this thesis (Chapter 1). An emotion perception study was then conducted to examine how three groups of participants (Australian English, Malaysian Malay and Hong Kong Cantonese speakers) identified spoken expressions of emotion produced in either English or Cantonese (Chapter 2). As one aim of this study was to disentangle the effects of language from those of culture, these participants were selected on the basis that they shared either a language type (non-tone languages: Malay and English) or a culture (collectivist cultures: Cantonese and Malay). The results showed greater similarity in emotion perception between those who spoke a similar type of language than between those who shared a similar culture, suggesting that some intergroup differences in emotion perception may be attributable to cross-language differences. Following up on these findings, an acoustic analysis study (Chapter 3) showed that, compared to English spoken expressions of emotion, Cantonese expressions carried weaker F0-related cues (a lower median F0 and a flatter F0 contour), and that the way F0 cues were used also differed. Taken together, these results show that language characteristics (in F0 usage) interact with the production and perception of spoken expressions of emotion.

    The expression of disgust was used to investigate how facial expressions of emotion affect speech articulation. The rationale for selecting disgust was that its facial expression involves changes to the mouth region, such as closure and retraction of the lips, and these changes are likely to have an impact on speech articulation. To test this idea, an automatic lip segmentation and measurement algorithm was developed to quantify the configuration of the lips from images (Chapter 5). A comparison of neutral and disgust expressive speech showed that disgust expressive speech is produced with a significantly smaller vertical mouth opening, a greater horizontal mouth opening, and lower first and second formant frequencies (F1 and F2). Overall, this thesis provides insight into how aspects of expressive speech may be shaped by specific (language type) and universal (facial emotion expression) factors.
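
    As a concrete illustration of the kind of F0 measurement this language contrast rests on, the sketch below estimates the median F0 and a simple F0-contour slope for a single utterance. It is a minimal sketch, not the thesis pipeline: the file name, sample rate and the use of librosa's pYIN tracker are assumptions made for the example.

    ```python
    # Illustrative sketch (not from the thesis): estimating the F0 median and a
    # simple F0-contour slope for one utterance with librosa's pYIN tracker.
    import librosa
    import numpy as np

    y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical stimulus file

    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )

    voiced_f0 = f0[voiced_flag]           # keep voiced frames only
    f0_median = np.nanmedian(voiced_f0)   # the "median F0" cue

    # A flatter contour yields a slope nearer zero: fit a line over voiced frames.
    times = librosa.times_like(f0, sr=sr)[voiced_flag]
    slope = np.polyfit(times, voiced_f0, 1)[0]  # Hz per second

    print(f"median F0: {f0_median:.1f} Hz, contour slope: {slope:.1f} Hz/s")
    ```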

    Visual vs. auditory emotion information : how language and culture affect our bias towards the different modalities

    This study investigated whether familiarity with the language in which an emotion is expressed affects how information from the different sensory modalities is weighted in auditory-visual (AV) processing. The rationale for this study is that visual information may drive multisensory perception of emotion when a person is unfamiliar with a language, and this visual dominance effect may be reduced when a person can understand and extract emotion information from the language. To test this, Cantonese, English and Malay speakers were presented with spoken Cantonese and English emotion expressions (angry, happy, sad, disgust and surprise) in auditory-only (AO), visual-only (VO) or AV conditions. Response matrices were examined to see whether patterns of responses changed as a function of whether the expressions were produced in the participants' native or a non-native language. Our results show that the visual dominance effect for the Cantonese and Malay participants changed depending on the language in which an emotion was expressed, while the English participants showed a strong visual dominance effect regardless of the language of expression.
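
    One way to make the idea of a visual dominance effect concrete is to ask whether the AV response matrix resembles the VO matrix more than the AO matrix. The sketch below is a hypothetical illustration of that logic, not the study's analysis; the correlation-based similarity measure and the randomly generated matrices are assumptions.

    ```python
    # Illustrative sketch (assumed, not the paper's analysis): a rough "visual
    # dominance" index from 5x5 response matrices (rows: expressed emotion,
    # columns: response), each row normalised to sum to 1.
    import numpy as np

    def similarity(a, b):
        """Pearson correlation between two flattened response matrices."""
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    def visual_dominance_index(av, ao, vo):
        """Positive values: AV responses track VO more closely than AO."""
        return similarity(av, vo) - similarity(av, ao)

    rng = np.random.default_rng(0)
    ao, vo, av = (rng.dirichlet(np.ones(5), size=5) for _ in range(3))  # fake data

    print(f"visual dominance index: {visual_dominance_index(av, ao, vo):+.2f}")
    ```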

    Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotions

    It has been claimed that tone language speakers use fewer F0-related cues in the production of verbal expressions of emotion, because F0 is used in the production of lexical tones. This study investigated this claim by examining how F0 and various other acoustic parameters are used in the production of verbal emotion expressions in Cantonese (a tone language) compared to English (a non-tone language). Acoustic measurements (e.g., mean F0, F0 range) were extracted from the verbal expressions of five emotions (angry, happy, sad, surprise and disgust) and a neutral expression produced by five male native speakers of Cantonese and English. The measurements were analyzed using K-means clustering to see how the different acoustic properties group together and how this varies as a function of language. The results showed some differences between the two languages in how F0-related cues are used in the production of emotions. The results are discussed in terms of the general acoustic characteristics of spoken emotion expressions and in relation to behavioral data from perceptual studies.
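
    The clustering step can be sketched as follows. This is a minimal illustration assuming a per-utterance feature matrix and a standardize-then-cluster pipeline with scikit-learn; the feature set, number of clusters and data are placeholders, not the paper's settings.

    ```python
    # Illustrative sketch (assumed details, not the paper's exact pipeline):
    # K-means clustering of per-utterance acoustic measurements, then comparing
    # how utterances from the two languages fall into the clusters.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical feature matrix: one row per utterance, columns such as
    # [mean F0, F0 range, intensity, duration].
    rng = np.random.default_rng(1)
    features = rng.normal(size=(60, 4))           # placeholder for real measurements
    language = np.repeat(["Cantonese", "English"], 30)

    X = StandardScaler().fit_transform(features)  # put parameters on one scale
    labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

    # Cross-tabulate cluster membership by language to see grouping differences.
    for lang in ("Cantonese", "English"):
        counts = np.bincount(labels[language == lang], minlength=6)
        print(lang, counts)
    ```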

    The sound of disgust : how facial expression may influence speech production

    In speech articulation, mouth and lip shapes determine properties of the front part of the vocal tract, and so alter vowel formant frequencies. Mouth and lip shapes also determine facial expressions of emotion; disgust, for example, is typically expressed with a distinctive lip and mouth configuration (a closed mouth and pulled-back lip corners). This overlap of speech and emotion gestures suggests that expressive speech will have different vowel formant frequencies from neutral speech. This study tested the hypothesis by comparing vowels produced in neutral versus disgust expressions. We used our database of five female native Cantonese talkers, each uttering 50 CHINT sentences both in a neutral tone of voice and in disgust, to examine five vowels. Mean fundamental frequency (F0) and the first two formants (F1 and F2) were calculated and analysed using mixed effects logistic regression. The results showed that, compared to neutral, the disgust vowels had a significant reduction in one or both formant values, depending on vowel type. We discuss the results in terms of how vowel synthesis could be used to alter the recognition of the sound of disgust.
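
    For readers wanting to reproduce the measurement step, the sketch below shows one common way to extract F0, F1 and F2 at a vowel midpoint using the praat-parselmouth library. This is an assumed tooling choice rather than what the study necessarily used, and the file name and analysis settings are placeholders.

    ```python
    # Illustrative sketch (assumed tooling, not the study's measurement script):
    # F0 and the first two formants at a vowel midpoint via praat-parselmouth.
    import parselmouth

    snd = parselmouth.Sound("vowel_token.wav")  # hypothetical vowel recording
    t_mid = snd.duration / 2                    # measure at the temporal midpoint

    pitch = snd.to_pitch()
    f0 = pitch.get_value_at_time(t_mid)         # Hz; NaN if the frame is unvoiced

    formants = snd.to_formant_burg()            # Burg-method formant tracking
    f1 = formants.get_value_at_time(1, t_mid)
    f2 = formants.get_value_at_time(2, t_mid)

    print(f"F0={f0:.0f} Hz, F1={f1:.0f} Hz, F2={f2:.0f} Hz")
    ```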

    The effect of spectral profile on the intelligibility of emotional speech in noise

    The current study investigated why the intelligibility of expressive speech in noise varies as a function of the emotion expressed (e.g., happiness being more intelligible than sadness) even though the signal-to-noise ratio is the same. We tested the straightforward proposal that the expression of some emotions affects speech intelligibility by shifting spectral energy above the energy profile of the noise masker. This was done by determining how the spectral profile of speech is affected by different emotional expressions, using three different expressive speech databases. We then examined whether these changes were correlated with scores produced by an objective intelligibility metric. We found a relatively consistent shift in spectral energy for the different emotions across the databases, and a high correlation between the extent of these changes and the objective intelligibility scores. Moreover, the pattern of intelligibility scores is consistent with human perception studies (although there was considerable individual variation). We suggest that the intelligibility of emotional speech in noise is simply related to its audibility, as conditioned by the effect that the expression of emotion has on the spectral profile of the speech.
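
    The core comparison, speech spectral energy against the masker's spectral profile, can be sketched roughly as follows. This is an assumption about the style of analysis (a Welch long-term average spectrum and a simple above-masker band count), not the paper's metric or code; the file names are hypothetical and mono audio at a shared sample rate is assumed.

    ```python
    # Illustrative sketch (assumed analysis style, not the paper's metric):
    # comparing a long-term average spectrum (LTAS) of emotional speech against
    # a noise masker's spectrum to gauge how much speech energy sits above it.
    import soundfile as sf
    from scipy.signal import welch

    speech, sr = sf.read("happy_sentence.wav")  # hypothetical mono recordings,
    noise, _ = sf.read("masker.wav")            # assumed same sample rate

    # Welch power spectral densities on a common frequency grid.
    freqs, psd_speech = welch(speech, fs=sr, nperseg=1024)
    _, psd_noise = welch(noise, fs=sr, nperseg=1024)

    # Fraction of frequency bands where speech energy exceeds the masker:
    # a crude proxy for the audibility of this emotional speech in this noise.
    above = psd_speech > psd_noise
    print(f"speech exceeds masker in {above.mean():.0%} of bands")
    ```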

    Disgust expressive speech : the acoustic consequences of the facial expression of emotion

    This study investigated how the facial expression of disgust may affect the acoustics of speech. Viewed as part of a pathogen avoidance mechanism, the expression of disgust would seem to require speech to be produced with a smaller mouth opening than neutral speech, thereby lowering the formant frequencies. This hypothesis was tested by comparing how lip configuration (i.e., the height, width and area of the lips), fundamental frequency (F0) and the first two formants (F1 and F2) of vowels changed when produced in neutral versus disgust expressions. The vowels were extracted from 50 Cantonese sentences spoken by 10 talkers (5 male), each produced once in disgust and once in a neutral tone of voice. Mixed effects logistic regression models revealed that in disgust, vowels were produced with a lower lip height, lower F1 and F2, and a higher F0 than in neutral speech. These results support the notion that the facial expression of emotions may have a role in shaping the acoustic properties of vocal expressions of emotion.
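
    A lip configuration measurement of the kind described could, for illustration, be obtained by colour thresholding and contour analysis. The sketch below is hypothetical and is not the study's segmentation algorithm; the input crop, HSV thresholds and OpenCV pipeline are all assumptions, and the thresholds would need tuning per talker and lighting condition.

    ```python
    # Illustrative sketch (hypothetical, not the study's algorithm): measuring
    # lip height, width and area from a mouth-region image with OpenCV.
    import cv2
    import numpy as np

    mouth = cv2.imread("mouth_crop.png")  # hypothetical mouth-region crop
    hsv = cv2.cvtColor(mouth, cv2.COLOR_BGR2HSV)

    # Rough reddish-hue mask for the lips (assumed thresholds, needs tuning).
    mask = cv2.inRange(hsv, (0, 60, 60), (12, 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lips = max(contours, key=cv2.contourArea)  # assume the largest blob is the lips

    x, y, w, h = cv2.boundingRect(lips)
    print(f"width={w}px, height={h}px, area={cv2.contourArea(lips):.0f}px^2")
    ```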

    The effect of expression clarity and presentation modality on non-native vocal emotion perception

    The current study investigated how the presentation of visual information and the clarity of expressions influence the non-native effect in vocal emotion perception, i.e., the disadvantage listeners show for emotions expressed in a non-native language. Australian English and Cantonese native listeners were presented with spoken Australian English sentences produced by actors whose emotional expressions were either very clear or ambiguous (levels of clarity were established in another study). Angry, happy, sad, surprise and disgust expressions were tested in auditory-only (AO), visual-only (VO) and audio-visual (AV) conditions. The results showed the expected non-native disadvantage for AO presentation, with the Cantonese listeners' performance significantly less accurate than that of the English listeners. There was also the expected difference as a function of the clarity of the emotion expression; this effect was of the same magnitude across the language groups. This was not the case in the VO or AV conditions, where performance levels did not differ between the groups. This indicates that visual cues helped the Cantonese listeners compensate for poorer AO recognition.
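
    For illustration, recognition accuracy in a design like this can be tabulated by group, modality and clarity from trial-level responses, as in the sketch below; the data layout and values are invented for the example and do not reflect the study's results.

    ```python
    # Illustrative sketch (invented data, not the study's results): accuracy by
    # listener group, presentation modality and expression clarity.
    import pandas as pd

    # Hypothetical trial-level data: one row per response.
    trials = pd.DataFrame({
        "group":    ["English", "English", "Cantonese", "Cantonese"] * 2,
        "modality": ["AO", "AV"] * 4,
        "clarity":  ["clear"] * 4 + ["ambiguous"] * 4,
        "correct":  [1, 1, 0, 1, 1, 1, 0, 1],
    })

    accuracy = trials.pivot_table(
        index="group", columns=["modality", "clarity"],
        values="correct", aggfunc="mean",
    )
    print(accuracy)  # a non-native gap would appear mainly in the AO columns
    ```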